A strong converse for classical channel coding using entangled inputs 
A fully general strong converse for channel coding states that when the rate of sending classical 
information exceeds the capacity of a quantum channel, the probability of correctly decoding goes 
to zero exponentially in the number of channel uses, even when we allow code states which are 
entangled across several uses of the channel. Such a statement was previously only known for 
classical channels and the quantum identity channel. By relating the problem to the additivity 
of minimum output entropies, we show that a strong converse holds for a large class of channels, 
including all unital qubit channels, the d-dimensional depolarizing channel and the Werner-Holevo 
channel. This further justifies the interpretation of the classical capacity as a sharp threshold for 
information-transmission. 



A fundamental problem in quantum information theory is the transmission of classical information over (noisy) quantum channels. As a simple example, suppose we send $M$ classical bits using a qubit identity channel $n$ times. Clearly [28], this can be done reliably if $M \le n$, but if the number of classical bits exceeds the number of qubits sent ($M > n$) we are no longer able to recover the encoded information with perfect accuracy [29]. This situation is analogous to the problem of information transmission over a noisy classical channel. Here, there exists a constant $C$, called the classical capacity, which determines the maximal number of classical bits that can be sent reliably per channel use: by using the channel $n$ times, we can reliably transmit $M$ bits if and only if the rate $R = M/n$ satisfies $R < C$ in the asymptotic limit. This is known as the coding theorem due to Shannon [1]. For example, for the binary bit flip channel, which flips an input bit with probability $p$, this constant is given by $C = 1 - h(p)$, where $h$ is the binary entropy function. The unifying concept for both scenarios is that of the classical capacity $C$. For the qubit identity channel, Holevo's seminal result shows that the classical capacity is equal to 1.
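As a small numerical illustration of the bit flip example, the following sketch evaluates $C = 1 - h(p)$ for a few flip probabilities. The function names `binary_entropy` and `bit_flip_capacity` are chosen here for illustration only and do not appear in the text.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, using the convention 0 * log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bit_flip_capacity(p: float) -> float:
    """Capacity C = 1 - h(p) of the binary bit flip channel, in bits per use."""
    return 1.0 - binary_entropy(p)

# A noiseless channel (p = 0) carries one bit per use; p = 1/2 carries nothing.
for p in (0.0, 0.11, 0.5):
    print(f"p = {p:.2f}: C = {bit_flip_capacity(p):.4f}")
```

Coding at any rate $R$ below the printed value is possible with vanishing error, while the strong converse concerns rates above it.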

In fact, for both the qubit identity channel and any classical channel, the classical capacity $C$ imposes a sharp bound on our ability to recover classical information sent over the channel: On the one hand, if $R < C$, then it is possible to send $nR$ classical bits by using the channel $n$ times in such a way that the probability $P_{succ}$ of successful decoding goes to 1 exponentially as $n \to \infty$. This is also referred to as the achievability of the capacity. On the other hand, if $R > C$, then for any encoding and decoding scheme, $P_{succ}$ is exponentially small in the difference $n(R - C)$. This is referred to as the strong converse of the coding theorem for these channels.

For classical noisy channels, the strong converse was established by Wolfowitz [3]. For the qubit identity channel $\mathrm{id}_2 = \mathrm{id}_{\mathcal{B}(\mathbb{C}^2)}$, the argument is rather simple: Suppose we encode a uniformly distributed $nR$-bit string $X \in \{0,1\}^{nR}$ using a family of $2^{nR}$ states $\{\rho_x\}_{x=1}^{2^{nR}}$ on $(\mathbb{C}^2)^{\otimes n}$ (i.e., of $n$ qubits). Then, for any decoding POVM $\{E_x\}_{x=1}^{2^{nR}}$ on $(\mathbb{C}^2)^{\otimes n}$, the average success probability of correctly decoding is bounded by

$$\frac{1}{2^{nR}}\sum_{x=1}^{2^{nR}}\mathrm{tr}(E_x\rho_x) \le \frac{1}{2^{nR}}\,\mathrm{tr}\Big(\sum_{x=1}^{2^{nR}}E_x\Big) = 2^{-n(R-1)}.$$

Here, we used the operator inequality $\rho_x \le I_{(\mathbb{C}^2)^{\otimes n}}$ for every $x$, and the fact that the operator elements of a POVM sum to the identity. Due to the strong converse property, we can regard the capacity $C$ as an exact measure of the information-carrying power of any classical channel and the quantum identity channel.
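To make the exponential decay for the qubit identity channel concrete, here is a minimal sketch that evaluates the bound $2^{-n(R-1)}$ on the average success probability. The helper name `identity_channel_success_bound` is an illustrative choice; the cap at 1 is added because a probability cannot exceed 1 when $R \le 1$.

```python
def identity_channel_success_bound(n: int, rate: float) -> float:
    """Upper bound 2^{-n(R-1)} on the average success probability, capped at 1."""
    return min(1.0, 2.0 ** (-n * (rate - 1.0)))

# Coding at rate R = 1.5 bits per qubit: the bound halves with every two channel uses.
for n in (2, 10, 50):
    print(n, identity_channel_success_bound(n, rate=1.5))
```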

Unfortunately, this appealing operational interpretation of the classical capacity $C$ is not quite as complete for general quantum channels. While the achievability of the capacity has been established (building on [6]), only a weak converse has been shown without assumptions. It merely states that for rates $R > C$ above the capacity, the success probability is bounded away from 1. This is in contrast to a strong converse, which shows that this probability goes to zero exponentially in the limit as $n$ goes to infinity.

Here, we are interested in the validity of the strong converse property for a general quantum channel. Establishing such a converse is more difficult than for classical channels for the same reason it is difficult to compute the classical capacity of a quantum channel: We have to take into account the possibility that entanglement over several uses of the channel may help to increase the probability of successful decoding. Indeed, a recent breakthrough result by Hastings shows that using entangled states can be advantageous. Formally, this is expressed by the product-state capacity $C_\Phi^{prod}$: This is defined in the same way as the capacity, but with the restriction that the input states to the channel $\Phi^{\otimes n}$ have to be of tensor product form. Hastings' result shows that there are channels $\Phi$ with $C_\Phi^{prod} < C_\Phi$.

In light of the advantage of entanglement for coding, it is natural to ask whether entanglement may invalidate the strong converse property: In particular, we study whether allowing arbitrary (entangled) input states affects the exponential decay of the success probability. Previous studies of the region $R > C_\Phi$ were restricted to the case where the inputs are not entangled across different uses of the channel, and are thus conceptually similar to the study of the achievability of the product state capacity $C_\Phi^{prod}$ instead of the more general $C_\Phi$.



MAIN RESULT 

Here, we prove a strong converse for a large number of quantum channels. In particular, our result applies to

(i) the qudit depolarizing channel

$$\Lambda_r(\rho) = r\rho + (1-r)\frac{I}{d}, \tag{1}$$

replacing any input state with the fully mixed state with probability $(1-r)$, for $-1/(d^2-1) \le r \le 1$,

(ii) any unital qubit channel [30], and, more generally,

(iii) any channel which has additive minimum output $\alpha$-entropy $S^{\min}_\alpha$ for $\alpha > 1$ (close to 1) as defined below [31], and the following covariance property: there is a pair of unitary representations of some group $G$ on the input space $\mathcal{H}_{in}$ and the output space $\mathcal{H}_{out}$, respectively, such that

$$g\,\Phi(\rho)\,g^\dagger = \Phi(g\rho g^\dagger) \quad\text{for all } g \in G,$$

where the representation on $\mathcal{H}_{out}$ is irreducible. An example of such a channel is the Werner-Holevo channel [10].

More formally, we are concerned with (noisy) quantum channels, i.e., completely positive trace-preserving maps (CPTPM) $\Phi: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$. Throughout, we restrict our attention to finite-dimensional Hilbert spaces $\mathcal{H}_{in}$ and $\mathcal{H}_{out}$. A code of rate $R$ for $\Phi$ specifies (for every $n$) a family $\{\rho_x\}_{x=1}^{2^{nR}}$ of states on $\mathcal{H}_{in}^{\otimes n}$, where $\rho_x$ is the quantum codeword associated with the classical message $x \in \{1,\ldots,2^{nR}\}$. A corresponding decoder is a POVM $\{E_x\}_{x=1}^{2^{nR}}$ on $\mathcal{H}_{out}^{\otimes n}$. We are interested in the average success probability of decoding correctly, that is, the quantity

$$P^{\Phi}_{succ}(n,R) = \frac{1}{2^{nR}}\sum_{x=1}^{2^{nR}}\mathrm{tr}\big(E_x\,\Phi^{\otimes n}(\rho_x)\big). \tag{2}$$
In this terminology, we show the following: 

Theorem. Let $\Phi$ be a CPTPM described by (i)-(iii), and let $C_\Phi$ be its classical capacity. There exists a constant $\gamma > 0$ such that the following holds: For any code of rate $R$, and any corresponding decoder, the success probability $P^{\Phi}_{succ}(n,R)$ is upper bounded by $2^{-\gamma n(R - C_\Phi)}$ (for sufficiently large $n$).



Thus the success probability decays exponentially 
when coding at rates above the capacity. 

Background. Before giving a short overview of our proof, let us briefly recall how the study of the achievability of rates below the capacity can be subdivided into three major components: one begins by setting up a connection between the operational problem of coding and an entropic quantity. More precisely, one can show that there exist codes such that the success probability has a behavior of the form

$$P^{\Phi}_{succ}(n,R) = 1 - e^{-n\delta(\chi^*_\infty(\Phi) - R)} \tag{3}$$

with $\delta > 0$ for rates $R$ smaller than

$$\chi^*_\infty(\Phi) := \lim_{n\to\infty}\frac{1}{n}\chi^*(\Phi^{\otimes n}). \tag{4}$$

This quantity is the regularized version of the Holevo quantity of the channel $\Phi$, i.e.,

$$\chi^*(\Phi) := \max_{\{p_x,\rho_x\}_x}\chi(\{p_x,\Phi(\rho_x)\}_x), \tag{5}$$

which in turn is defined in terms of the Holevo quantity of an ensemble $\{p_x,\sigma_x\}_x$, given by

$$\chi(\{p_x,\sigma_x\}_x) := S\Big(\sum_x p_x\sigma_x\Big) - \sum_x p_x S(\sigma_x). \tag{6}$$

This is the first step in the study of the coding problem. It reduces the operational problem of coding to the study of the quantity (4). In particular, (3) tells us that we can code with exponentially small error at any rate $R < \chi^*_\infty(\Phi)$.

The second component is to study general properties of the quantity $\chi^*(\Phi)$. The computation of this value is drastically simplified in cases where the Holevo quantity is additive, that is,

$$\chi^*(\Phi^{\otimes(n-1)}\otimes\Phi) = \chi^*(\Phi^{\otimes(n-1)}) + \chi^*(\Phi) \tag{7}$$

for all $n > 1$, since this implies $\chi^*_\infty(\Phi) = \chi^*(\Phi)$. Note that part of this statement, the so-called superadditivity

$$\chi^*(\Phi^{\otimes(n-1)}\otimes\Phi) \ge \chi^*(\Phi^{\otimes(n-1)}) + \chi^*(\Phi),$$

is trivial, as it corresponds to restricting to product states. Showing whether or not (7) holds for a given channel $\Phi$ is called an additivity problem. It has several equivalent formulations: for example, the quantity $\chi^*(\Lambda)$, for any CPTPM $\Lambda$, can be reexpressed in terms of the relative entropy $D$ as

$$\chi^*(\Lambda) = \min_\sigma\max_\rho D(\Lambda(\rho)\|\Lambda(\sigma)) \tag{8}$$

as shown in [11]. The physical significance of the additivity property (7) stems from the fact that (4) is a formula for the capacity $C_\Phi$, while (5) is equal to the product state capacity $C_\Phi^{prod}$ [33]. Additivity of $\chi^*$ for a channel $\Phi$ therefore implies that there is no advantage in using entangled states for coding in the asymptotic limit.

Finally, one needs to investigate the additivity problem, which is poorly understood in general. King [12] has shown additivity of $\chi^*$ for the depolarizing channel (1). His proof uses the fact that for any covariant channel $\Phi$, the Holevo quantity is related to the minimum output entropy [13]

$$S^{\min}(\Phi) := \min_\rho S(\Phi(\rho)) \tag{9}$$

by

$$\chi^*(\Phi) = \log d_{out} - S^{\min}(\Phi), \tag{10}$$

where $d_{out}$ is the dimension of the output space $\mathcal{H}_{out}$. King then establishes the additivity of $S^{\min}$ for the depolarizing channel $\Lambda_r$ by showing that the related minimum $\alpha$-Rényi entropies $S^{\min}_\alpha$ (defined below) are additive for $\Lambda_r$. This implies additivity of $\chi^*$, and leads to an explicit formula for the capacity $C_{\Lambda_r}$.
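The explicit capacity formula can be evaluated numerically from (10). The sketch below assumes only two facts: every pure input to $\Lambda_r$ produces an output with eigenvalues $r + (1-r)/d$ (once) and $(1-r)/d$ ($d-1$ times), and the minimum output entropy is attained on pure inputs, which follows from concavity of the entropy. The function names are illustrative and not taken from the text.

```python
import math

def depolarized_pure_state_spectrum(d: int, r: float) -> list:
    """Eigenvalues of r*|psi><psi| + (1-r)*I/d for an arbitrary pure state |psi>."""
    small = (1.0 - r) / d
    return [r + small] + [small] * (d - 1)

def shannon_entropy(probs) -> float:
    """Entropy in bits; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def depolarizing_capacity(d: int, r: float) -> float:
    """log d_out - S^min for the depolarizing channel, as in Eq. (10)."""
    return math.log2(d) - shannon_entropy(depolarized_pure_state_spectrum(d, r))

# r = 1 is the identity channel, r = 0 replaces every input by the fully mixed state.
for r in (1.0, 0.5, 0.0):
    print(f"d = 2, r = {r}: capacity = {depolarizing_capacity(2, r):.4f} bits")
```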

Proof outline. Our approach to coding at rates above the capacity has the same overall structure as the study of the achievability explained above. The strong converse theorem is obtained by (a) relating the decoding probability to entropic quantities, (b) rephrasing the resulting additivity problems and finally (c) showing that the channels (i)-(iii) satisfy these additivity properties.

The relevant quantities in our case turn out to be the following Rényi-entropic versions of the above quantities. For $\alpha > 1$, we use [33]

$$\begin{aligned}
S_\alpha(\rho) &:= \frac{1}{1-\alpha}\log\mathrm{tr}(\rho^\alpha), \\
D_\alpha(\rho\|\sigma) &:= \frac{1}{\alpha-1}\log\mathrm{tr}(\rho^\alpha\sigma^{1-\alpha}), \\
\chi_\alpha(\{p_x,\sigma_x\}_x) &:= \frac{\alpha}{\alpha-1}\log\mathrm{tr}\Big[\Big(\sum_x p_x\sigma_x^\alpha\Big)^{1/\alpha}\Big].
\end{aligned} \tag{11}$$

We also need the corresponding derived quantities $\chi^*_\alpha(\Phi)$, $\chi^*_{\alpha,\infty}(\Phi)$, and $S^{\min}_\alpha(\Phi)$, defined as in (5), (4) and (9), respectively.
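For commuting (classical) states, the three quantities in (11) reduce to sums over diagonal entries, which makes them easy to evaluate. The following sketch is restricted to that classical case and uses illustrative function names; it also checks numerically that $\chi_\alpha$ approaches the ordinary Holevo quantity $1 - h(p)$ of a bit flip ensemble as $\alpha \to 1$.

```python
import math

def renyi_entropy(p, alpha):
    """S_alpha(p) = log2(sum_i p_i^alpha) / (1 - alpha) for a probability vector p."""
    return math.log2(sum(x ** alpha for x in p if x > 0.0)) / (1.0 - alpha)

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) = log2(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1); needs q_i > 0."""
    return math.log2(sum(x ** alpha * y ** (1.0 - alpha) for x, y in zip(p, q))) / (alpha - 1.0)

def renyi_holevo(weights, states, alpha):
    """chi_alpha = alpha/(alpha-1) * log2 sum_i (sum_x p_x sigma_x(i)^alpha)^(1/alpha)."""
    total = sum(
        sum(w * s[i] ** alpha for w, s in zip(weights, states)) ** (1.0 / alpha)
        for i in range(len(states[0]))
    )
    return alpha / (alpha - 1.0) * math.log2(total)

# Outputs of a bit flip channel with p = 0.1 for the two inputs, used with equal weights.
weights = [0.5, 0.5]
outputs = [[0.9, 0.1], [0.1, 0.9]]
for alpha in (1.0001, 1.5, 2.0):
    print(alpha, renyi_holevo(weights, outputs, alpha))
```

The printed values increase with $\alpha$, so parameters close to 1 give the tightest comparison with the capacity.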

We now give a sketch of the proof, following the three steps (a)-(c) outlined above (details can be found in the appendix). First, we relate our operational problem to the regularized quantity $\chi^*_{\alpha,\infty}(\Phi)$ by showing that for any code of rate $R$, we have

$$P^{\Phi}_{succ}(n,R) \le 2^{-n(1-\frac{1}{\alpha})(R-\chi^*_{\alpha,\infty}(\Phi))} \quad\text{for all } \alpha > 1$$

for sufficiently large $n$. This is the analog of (3). It shows that for any rate $R > \chi^*_{\alpha,\infty}(\Phi)$, the success probability decays exponentially with $n$.



Clearly, the quantity $\chi^*_{\alpha,\infty}(\Phi)$ again has a particularly simple form if $\chi^*_\alpha$ is additive as in (7). To study additivity of the quantity $\chi^*_\alpha(\Phi)$, the second step of our proof is to derive the following analog of (8), essentially following the steps of Schumacher and Westmoreland [11]:

$$\min_{\sigma_{out}}\max_\rho D_\alpha(\Lambda(\rho)\|\sigma_{out}) \le \chi^*_\alpha(\Lambda) \le \min_{\sigma_{in}}\max_\rho D_\alpha(\Lambda(\rho)\|\Lambda(\sigma_{in})).$$

As before, additivity of the quantity $\chi^*_\alpha(\Phi)$ is intimately connected to the classical capacity $C_\Phi$: As shown by Ogawa and Nagaoka, for every $\epsilon > 0$, we have $\chi^*_\alpha(\Phi) \le \chi^*(\Phi) + \epsilon$ for all $\alpha > 1$ in some neighborhood of 1. In particular, together with the bound on $P^{\Phi}_{succ}(n,R)$ stated above, this shows that additivity of $\chi^*_\alpha$ for all $\alpha$ in the vicinity of 1 implies a strong converse, that is, an exponential decay of the success probability for any rate $R > C_\Phi$. Since it is known that coding with product states at rates above the capacity leads to the same exponential behavior, we can conclude that entanglement provides no operational advantage.

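Finally, we show additivity of $\chi^*_\alpha$ for the special class of channels $\Phi$ satisfying our assumptions (i)-(iii). For these channels, the covariance properties imply that the lower and upper bounds in the preceding chain of inequalities coincide and are attained when $\sigma_{in}$ and $\sigma_{out}$ are completely mixed. By definition, this means that these channels satisfy the Rényi-entropic version

$$\chi^*_\alpha(\Phi) = \log d_{out} - S^{\min}_\alpha(\Phi)$$

of (10). Additivity of $\chi^*_\alpha$ is shown by combining this identity with the upper bound on $\chi^*_\alpha$, as follows. For $\sigma_{in} = I/d_{in}$ equal to the fully mixed state, we get

$$\begin{aligned}
\chi^*_\alpha(\Phi^{\otimes n}) &\le \max_\rho D_\alpha\big(\Phi^{\otimes n}(\rho)\,\big\|\,\Phi^{\otimes n}((I/d_{in})^{\otimes n})\big) \\
&= \log d_{out}^{\,n} - S^{\min}_\alpha(\Phi^{\otimes n}) \\
&= n\log d_{out} - n\,S^{\min}_\alpha(\Phi).
\end{aligned} \tag{12}$$

In the last step, we used the additivity of the minimum output $\alpha$-entropy $S^{\min}_\alpha$ for the channels of interest for $\alpha > 1$ close to 1 (cf. [14] for qubit unital channels, [12] for the depolarizing channel, and the corresponding results for the Werner-Holevo channel). By the superadditivity property of the quantity $\chi^*_\alpha$, we know that $n\chi^*_\alpha(\Phi) \le \chi^*_\alpha(\Phi^{\otimes n})$. Combining this with (12) and the Rényi-entropic version of (10) proves additivity, that is, $\chi^*_{\alpha,\infty}(\Phi) = \chi^*_\alpha(\Phi) = \log d_{out} - S^{\min}_\alpha(\Phi)$. This concludes the proof of our main result.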
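The resulting exponent $(1-\frac{1}{\alpha})(R - \log d_{out} + S^{\min}_\alpha(\Phi))$ can be evaluated explicitly for the depolarizing channel. The sketch below is illustrative only: it assumes that the minimum output $\alpha$-entropy is attained on pure inputs, and it scans $\alpha \in (1, 2]$ under the assumption that additivity holds on that whole range, whereas the text only requires it close to 1. All function names are hypothetical.

```python
import math

def min_output_renyi_entropy(d, r, alpha):
    """S_alpha of the output spectrum of the depolarizing channel on a pure input."""
    small = (1.0 - r) / d
    spectrum = [r + small] + [small] * (d - 1)
    return math.log2(sum(x ** alpha for x in spectrum if x > 0.0)) / (1.0 - alpha)

def converse_exponent(d, r, rate, alpha):
    """(1 - 1/alpha) * (R - chi*_alpha), with chi*_alpha = log d - S^min_alpha."""
    chi_alpha = math.log2(d) - min_output_renyi_entropy(d, r, alpha)
    return (1.0 - 1.0 / alpha) * (rate - chi_alpha)

def best_exponent(d, r, rate, steps=1000):
    """Largest exponent found on a grid of alpha values in (1, 2]."""
    alphas = [1.0 + k / steps for k in range(1, steps + 1)]
    return max(converse_exponent(d, r, rate, a) for a in alphas)

# Qubit depolarizing channel with r = 0.5 has capacity 1 - h(0.25), about 0.19 bits.
print(best_exponent(2, 0.5, rate=0.5))   # positive: success probability decays
print(best_exponent(2, 0.5, rate=0.1))   # negative: the bound is vacuous below capacity
```

A positive value $\gamma'$ printed for a rate above capacity means the bound reads $P^{\Phi}_{succ}(n,R) \le 2^{-n\gamma'}$ under the stated assumptions.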

CONCLUSION 

In summary, we have shown that for a large class of practically relevant quantum channels, the probability of reliably transmitting $nR$ classical bits by $n$ uses of the channel has an asymptotic behavior of the form $2^{-\gamma n(R-C)}$ for some constant $\gamma > 0$ when coding at rates $R$ above the classical capacity $C$. Such a statement was previously only known for classical channels and the identity channel. Our result has direct practical applications to quantum cryptography, especially in the so-called noisy-quantum-storage model [18, 19], where the adversary is restricted to using low-capacity channels. For these applications, some knowledge about the optimal constant $\gamma$ will be useful. Our work provides bounds on this value, about which little is known even in the classical case.

On a more fundamental level, our result implies that for the quantum channels considered, using entanglement provides no advantage in any rate regime. These channels therefore behave just as classical channels with respect to the transmission of classical information. Establishing strong converses for a wider class of channels is of fundamental importance, as this is the natural counterpart of the achievability statement of the capacity. Of particular interest in this context are channels whose Holevo quantity is non-additive. While we do not explicitly use this fact, the Holevo quantity is additive for the channels considered in this paper.

Showing that the success probability of decoding has 
an exponential behavior both below and above the capac- 
ity confirms our interpretation of the classical capacity as 
the single relevant measure of the usefulness of a quan- 
tum channel for classical communication. 

We acknowledge support by NSF grants PHY-04056720 and PHY-0803371.

In this appendix, we provide a detailed proof of the strong converse theorem for all channels described by (i)-(iii). Let us first argue that it suffices to consider channels of the type (iii), i.e., covariant channels. Indeed, the $d$-dimensional depolarizing channel (1) is just a special example of (iii), since it has additive minimum output $\alpha$-entropy [12] and is covariant with respect to the unitary group. For unital qubit channels, first observe that the quantity $\chi^*_\alpha(\Phi)$ of interest remains unchanged when considering a unitarily equivalent channel, i.e., one which additionally conjugates the input and output with fixed unitaries $U_{in}$ and $U_{out}$, respectively. It has been shown that any one-qubit unital channel is unitarily equivalent to a Pauli diagonal channel

$$\Phi(\rho) = \sum_j \alpha_j\,\sigma_j\,\rho\,\sigma_j$$

for some $\alpha_j \ge 0$, where $\{\sigma_j\}_j$ are the Pauli matrices. This channel is an instance of (iii), since it has additive minimum output $\alpha$-entropy [14] and is covariant with respect to the irreducible action of the Pauli group on $\mathbb{C}^2$. We will therefore restrict our attention to covariant channels in this appendix. We now state our main result more formally:

Theorem I.1 (Strong converse). Let $\Phi: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$ be a CPTPM satisfying

(A) $\Phi$ is covariant with respect to a pair of unitary representations of a compact group $G$ on $\mathcal{H}_{in}$ and $\mathcal{H}_{out}$, where the representation on $\mathcal{H}_{out}$ is irreducible.

(B) The minimum output entropy $S^{\min}_\alpha(\Phi)$ is additive for $\alpha > 1$ (for $\alpha$ close to 1).

Then the strong converse holds for $\Phi$, that is, $P^{\Phi}_{succ}(n,R) \to 0$ exponentially for any rate $R > \chi^*(\Phi) = C_\Phi$.

We assume throughout that the representations of $G$ are continuous, and state our proofs for the case where $G$ is finite (the general case is analogous, see e.g., [13] for details).

For rates $R < C_\Phi$, the rate of convergence of $P^{\Phi}_{succ}(n,R) \to 1$ for the optimal code and decoder is measured by the so-called reliability function

$$E^\Phi(R) = \limsup_{n\to\infty}\, -\frac{1}{n}\log\big(1 - P^{\Phi}_{succ}(n,R)\big). \tag{13}$$

For $R > C_\Phi$, we are interested in the rate at which $P^{\Phi}_{succ}(n,R) \to 0$ as $n \to \infty$. In analogy to (13), we introduce the function

$$E^\Phi(R) = \liminf_{n\to\infty}\, -\frac{1}{n}\log P^{\Phi}_{succ}(n,R). \tag{14}$$

We now make the three main steps (a)-(c) in the proof of our theorem more explicit. The following lemma gives a bound on (14) in terms of the regularized $\alpha$-Holevo quantity, and thus connects $\alpha$-Holevo quantities to the operational coding problem.

Lemma I.2. For all CPTPMs $\Phi: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$:

1. The operational quantity (14) is bounded by the regularized $\alpha$-Holevo quantity as

$$E^\Phi(R) \ge \Big(1-\frac{1}{\alpha}\Big)\big(R - \chi^*_{\alpha,\infty}(\Phi)\big) \quad\text{for all } \alpha > 1.$$

2. For every $R > \chi^*(\Phi)$, there exists $\beta = \beta(R) > 1$ such that $R > \chi^*_\alpha(\Phi)$ for all $1 < \alpha < \beta$.

In particular, $E^\Phi(R) > 0$ for all $R > \chi^*(\Phi)$ if $\chi^*_\alpha$ is additive for $\Phi$ for all $\alpha > 1$ close to 1.

The proof of this lemma, which is essentially identical to a derivation in prior work, is given later in this appendix. Note that for the channels of interest, we have $C_\Phi = \chi^*(\Phi)$. Therefore, Lemma I.2 reduces the problem of establishing a strong converse to the additivity of $\chi^*_\alpha$.

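Recall that the second step is to bound the quantity $\chi^*_\alpha$ in terms of a generalized form of the relative entropy for $\alpha > 1$.

Lemma I.3. Let $\Lambda: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$ be a CPTPM, and $\alpha > 1$. The quantity $\chi^*_\alpha(\Lambda)$ is related to $D_\alpha$ by

$$\min_{\sigma_{out}}\max_\rho D_\alpha(\Lambda(\rho)\|\sigma_{out}) \le \chi^*_\alpha(\Lambda) \le \min_{\sigma_{in}}\max_\rho D_\alpha(\Lambda(\rho)\|\Lambda(\sigma_{in})). \tag{15}$$

Moreover, if $\Phi: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$ is a CPTPM satisfying the covariance property (A), then

$$\begin{aligned}
\chi^*_\alpha(\Phi) &= \min_{\sigma_{out}}\max_\rho D_\alpha(\Phi(\rho)\|\sigma_{out}) \\
&= \min_{\sigma_{in}}\max_\rho D_\alpha(\Phi(\rho)\|\Phi(\sigma_{in})) \\
&= \log d_{out} - S^{\min}_\alpha(\Phi),
\end{aligned} \tag{16}$$

where $d_{out}$ is the dimension of $\mathcal{H}_{out}$.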



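Proof. The inequalities (15) follow from a more general statement shown in Lemma II.10. We now show that identity (16) follows from (15) and a straightforward application of the covariance property: Consider a CPTPM $\Phi: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$ with property (A), and let $\Lambda: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$ be a unital CPTPM with the same range. Fix some states $\rho' \in \mathcal{S}(\mathcal{H}_{in})$ and $\sigma \in \mathcal{S}(\mathcal{H}_{in})$ and observe that

$$\max_\rho \mathrm{tr}\big(\Phi(\rho)^\alpha\Lambda(\sigma)^{1-\alpha}\big) \ge \mathrm{tr}\big(\Phi(g\rho' g^\dagger)^\alpha\Lambda(\sigma)^{1-\alpha}\big) = \mathrm{tr}\big(\Phi(\rho')^\alpha\,g^\dagger\Lambda(\sigma)^{1-\alpha}g\big)$$

for all $g \in G$. Here we used the covariance of $\Phi$, the identity $(gAg^\dagger)^\beta = gA^\beta g^\dagger$ and the cyclicity of the trace in the last identity. Taking the average over all $g \in G$ gives

$$\max_\rho \mathrm{tr}\big(\Phi(\rho)^\alpha\Lambda(\sigma)^{1-\alpha}\big) \ge \mathrm{tr}\Big(\Phi(\rho')^\alpha\,\frac{1}{|G|}\sum_{g\in G}g^\dagger\Lambda(\sigma)^{1-\alpha}g\Big) = \mathrm{tr}\Big(\Phi(\rho')^\alpha\,\frac{I}{d_{out}}\Big)\sum_i\lambda_i^{1-\alpha},$$

where $\{\lambda_i\}$ are the (non-zero) eigenvalues of the operator $\Lambda(\sigma)$; the last step uses Schur's lemma for the irreducible representation on $\mathcal{H}_{out}$. Note that for $\alpha > 1$ the expression $f(\lambda) := \sum_i\lambda_i^{1-\alpha}$ is minimal if $\lambda_i = 1/d_{out}$ for all $i = 1,\ldots,d_{out}$: this follows because

$$f(\lambda_1,\lambda_2,\ldots) \ge f\Big(\frac{\lambda_1+\lambda_2}{2},\frac{\lambda_1+\lambda_2}{2},\ldots\Big)$$

by the convexity of the function $x \mapsto x^{1-\alpha}$, and the symmetry of $f$ with respect to permutations of its arguments. This minimum is attained if $\Lambda(\sigma)$ is completely mixed, or (since $\Lambda$ is unital) by choosing $\sigma = I/d_{in}$ to be the fully mixed state on $\mathcal{H}_{in}$. We conclude that for all $\rho'$ and $\sigma$

$$\max_\rho \mathrm{tr}\big(\Phi(\rho)^\alpha\Lambda(\sigma)^{1-\alpha}\big) \ge \mathrm{tr}\big(\Phi(\rho')^\alpha\Lambda(I/d_{in})^{1-\alpha}\big).$$

Hence, taking the maximum over $\rho'$ and the minimum over $\sigma$ gives

$$\min_\sigma\max_\rho D_\alpha(\Phi(\rho)\|\Lambda(\sigma)) = \max_\rho D_\alpha(\Phi(\rho)\|I/d_{out}).$$

Applying this equation to the cases $\Lambda = \Phi$ (any covariant channel is unital), and $\Lambda = \mathrm{id}$ equal to the identity channel on $\mathcal{H}_{out}$ immediately shows that the two quantities in (15) are indeed equal, and given by

$$\begin{aligned}
\max_\rho D_\alpha(\Phi(\rho)\|I/d_{out}) &= \max_\rho\frac{1}{\alpha-1}\log\Big(d_{out}^{\alpha-1}\,\mathrm{tr}\big(\Phi(\rho)^\alpha\big)\Big) \\
&= \log d_{out} - S^{\min}_\alpha(\Phi)
\end{aligned}$$

as claimed. □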
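Two ingredients of this proof can be checked numerically for commuting states: the identity $D_\alpha(\tau\|I/d) = \log d - S_\alpha(\tau)$ used in the last display, and the fact that $f(\lambda) = \sum_i\lambda_i^{1-\alpha}$ is smallest for the uniform spectrum. The sketch below is a sanity check on example data chosen here, not part of the original argument; the function names are illustrative.

```python
import math

def renyi_entropy(p, alpha):
    """Classical S_alpha(p) in bits."""
    return math.log2(sum(x ** alpha for x in p if x > 0.0)) / (1.0 - alpha)

def renyi_divergence(p, q, alpha):
    """Classical D_alpha(p||q) in bits; assumes every q_i > 0."""
    return math.log2(sum(x ** alpha * y ** (1.0 - alpha) for x, y in zip(p, q))) / (alpha - 1.0)

def identity_gap(tau, alpha):
    """D_alpha(tau || uniform) - (log d - S_alpha(tau)); should vanish."""
    d = len(tau)
    uniform = [1.0 / d] * d
    return renyi_divergence(tau, uniform, alpha) - (math.log2(d) - renyi_entropy(tau, alpha))

def f(spectrum, alpha):
    """The function f(lambda) = sum_i lambda_i^(1 - alpha) from the proof."""
    return sum(x ** (1.0 - alpha) for x in spectrum)

tau = [0.7, 0.2, 0.1]
for alpha in (1.5, 2.0, 3.0):
    print(alpha, identity_gap(tau, alpha), f(tau, alpha) >= f([1.0 / 3] * 3, alpha))
```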



The last step in the proof of Theorem I.1 is to combine Lemma I.2 with the following statement derived in the main text.

Theorem I.4 (Additivity of $\chi^*_\alpha$). Let $\Phi: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$ be a CPTPM with properties (A) and (B) as in Theorem I.1. Then for all $\alpha > 1$

$$\chi^*_\alpha(\Phi^{\otimes n}) = n\cdot\chi^*_\alpha(\Phi) = n\big(\log d_{out} - S^{\min}_\alpha(\Phi)\big),$$

where $d_{out}$ is the dimension of $\mathcal{H}_{out}$.

In the remainder of this appendix, we fill in the remaining technical details. In particular, we derive (15) of Lemma I.3 as well as Lemma I.2.



α-RÉNYI QUANTITIES

We begin with a few properties of $\alpha$-relative entropies.

Properties of the $\alpha$-relative entropy

If $\rho$ and $\sigma$ are classical (i.e., commuting), $D_\alpha(\rho\|\sigma)$ reduces to the classical $\alpha$-relative entropy defined in [23]. The quantity $D_\alpha(\rho\|\sigma)$ for $0 < \alpha < 1$ was previously used, e.g., in [25]. Some of the following statements also hold for this regime; however, we concentrate on $\alpha > 1$. We begin by showing positivity of $D_\alpha(\rho\|\sigma)$.

Lemma II.5. $D_\alpha(\rho\|\sigma) \ge 0$ for all states $\rho,\sigma \in \mathcal{S}(\mathbb{C}^d)$ and $\alpha > 0$, where equality holds if and only if $\rho = \sigma$.

Proof. Let $\lambda = (\lambda_1,\ldots,\lambda_d)$ and $\mu = (\mu_1,\ldots,\mu_d)$ denote the eigenvalues of $\rho$ and $\sigma$ (in some fixed order), respectively, and let $\lambda^\pi = (\lambda_{\pi(1)},\ldots,\lambda_{\pi(d)})$ be the reordered list, for every permutation $\pi \in S_d$. Lemma IV.11 implies that

$$\min_{\pi\in S_d} D_\alpha(\lambda^\pi\|\mu) \le D_\alpha(\rho\|\sigma). \tag{17}$$

Inequality (17) and the fact that the classical $\alpha$-relative entropy is non-negative [23] immediately imply that $D_\alpha(\rho\|\sigma) \ge 0$ for all $\rho$ and $\sigma$.

Note that the classical $\alpha$-relative entropy $D_\alpha(P\|Q)$ of two distributions $P$ and $Q$ vanishes only if $P = Q$ [23]. Combining this with (17), we conclude that if $D_\alpha(\rho\|\sigma) = 0$, then $\rho$ and $\sigma$ must have the same spectrum $\lambda$ (up to some permutation). That is, there is a unitary $U$ such that $\sigma = U\rho U^\dagger$. In particular, we get

$$1 = 2^{(\alpha-1)D_\alpha(\rho\|\sigma)} = \sum_{i,j}\lambda_i^\alpha\lambda_j^{1-\alpha}|U_{ij}|^2 = A\star\Omega, \tag{18}$$

where $A$ and $\Omega$ are the matrices defined by $A_{ij} = \lambda_i^\alpha\lambda_j^{1-\alpha}$ and $\Omega_{ij} = |U_{ij}|^2$, and where we set

$$A\star B := \sum_{i,j}A_{ij}B_{ij}.$$

Because $\Omega$ is a doubly stochastic matrix, it is a convex combination

$$\Omega = \sum_{\pi\in S_d}P(\pi)\,\pi \tag{19}$$

of permutation matrices acting on $\mathbb{C}^d$, by Birkhoff's theorem (see e.g., [26, Theorem 8.7.1]). From (18) and (19), we get by linearity and the definition of $A$

$$1 = \sum_{\pi\in S_d}P(\pi)\,A\star\pi = \sum_{\pi\in S_d}P(\pi)\,2^{(\alpha-1)D_\alpha(\lambda\|\lambda^\pi)}.$$

With the positivity of the classical relative Rényi entropy, we conclude that

$$D_\alpha(\lambda\|\lambda^\pi) = 0$$

for every $\pi$ in the support of the distribution $P$. This in turn implies that for any such $\pi$, we have $\lambda = \lambda^\pi$. In other words, only permutations $\pi$ which permute indices corresponding to a fixed eigenvalue among themselves appear in (19). We conclude that $\Omega$, and in particular $U$, are block-diagonal, with the different blocks corresponding to different eigenvalues, that is, we have

$$\rho = \bigoplus_\lambda \lambda\,I_{\mathbb{C}^{m_\lambda}}, \qquad U = \bigoplus_\lambda U_\lambda,$$

where the direct sums are over all distinct eigenvalues of $\rho$ and $m_\lambda$ is the multiplicity of $\lambda$. This shows that $\sigma = U\rho U^\dagger = \rho$ if $D_\alpha(\rho\|\sigma) = 0$, as claimed. □
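The classical facts used in this proof are easy to observe numerically: $D_\alpha(\lambda\|\lambda^\pi)$ is non-negative for every permutation $\pi$ and vanishes only when the reordered spectrum coincides with the original one. The following sketch enumerates all permutations of an example spectrum with distinct entries; the spectrum and the function name are illustrative choices.

```python
import itertools
import math

def renyi_divergence(p, q, alpha):
    """Classical D_alpha(p||q) in bits; assumes every q_i > 0."""
    return math.log2(sum(x ** alpha * y ** (1.0 - alpha) for x, y in zip(p, q))) / (alpha - 1.0)

lam = [0.5, 0.3, 0.2]   # example spectrum with distinct eigenvalues
alpha = 2.0

# D_alpha(lambda || lambda^pi) for every reordering pi of the spectrum.
values = {
    perm: renyi_divergence(lam, [lam[i] for i in perm], alpha)
    for perm in itertools.permutations(range(len(lam)))
}
for perm, value in values.items():
    print(perm, round(value, 6))
```

Only the identity permutation gives zero here, because all eigenvalues in the example are distinct.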

Next we consider the relative entropy $D_\alpha(\rho\|\sigma)$ for states $\rho$, $\sigma$ defined by ensembles: For an ensemble $\{p_x,\rho_x\}_{x\in\mathcal{X}}$, we introduce a corresponding classical-quantum state (a cq-state)

$$\rho_{XQ} = \sum_{x=1}^{|\mathcal{X}|}p_x\,|x\rangle\langle x|\otimes\rho_x \in \mathcal{S}(\mathcal{H}_X\otimes\mathcal{H}_Q), \tag{20}$$

where $\{|x\rangle\}_{x=1}^{|\mathcal{X}|}$ is an orthonormal basis of $\mathcal{H}_X \cong \mathbb{C}^{|\mathcal{X}|}$. Note that this defines a one-to-one correspondence between ensembles and cq-states. The following lemma shows how the relative entropy of a cq-state $\rho_{XQ}$ and a product state $\rho_X\otimes\sigma_Q$ decomposes into a sum of two terms. Only one of the terms depends on $\sigma_Q$. It is given by the relative entropy of $\sigma_Q$ and some state $\mu_Q = \mu_{\alpha,Q}(\rho_{XQ})$ which is defined in terms of the ensemble.

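Lemma II.6. For every cq-state $\rho_{XQ} \in \mathcal{S}(\mathcal{H}_X\otimes\mathcal{H}_Q)$, define the state $\mu_Q = \mu_{\alpha,Q}(\rho_{XQ}) \in \mathcal{S}(\mathcal{H}_Q)$ by

$$\mu_{\alpha,Q}(\rho_{XQ}) := \frac{\big(\sum_x p_x\rho_x^\alpha\big)^{1/\alpha}}{\mathrm{tr}\big[\big(\sum_x p_x\rho_x^\alpha\big)^{1/\alpha}\big]}.$$

Then

$$D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma_Q) = D_\alpha(\rho_{XQ}\|\rho_X\otimes\mu_Q) + D_\alpha(\mu_Q\|\sigma_Q)$$

for all states $\sigma_Q \in \mathcal{S}(\mathcal{H}_Q)$.

Proof. Write $\zeta_Q := \big(\sum_x p_x\rho_x^\alpha\big)^{1/\alpha}$, so that $\mu_Q = \zeta_Q/\mathrm{tr}(\zeta_Q)$. Observe that

$$\begin{aligned}
D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma_Q) &= \frac{1}{\alpha-1}\log\mathrm{tr}\Big(\sum_x p_x\rho_x^\alpha\,\sigma_Q^{1-\alpha}\Big), \\
D_\alpha(\rho_{XQ}\|\rho_X\otimes\mu_Q) &= \frac{\alpha}{\alpha-1}\log\mathrm{tr}(\zeta_Q).
\end{aligned}$$

In particular, we get

$$2^{(\alpha-1)D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma_Q)} = \mathrm{tr}\big(\zeta_Q^\alpha\sigma_Q^{1-\alpha}\big) = \mathrm{tr}(\zeta_Q)^\alpha\,\mathrm{tr}\big(\mu_Q^\alpha\sigma_Q^{1-\alpha}\big),$$

from which the claim follows immediately. □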
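For commuting states the decomposition of Lemma II.6 can be verified directly, since all operator powers reduce to entrywise powers of probability vectors. The sketch below does this for a small example ensemble; it covers only the classical case, and the ensemble, the reference state and the function names are assumptions made for illustration.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical D_alpha(p||q) in bits; assumes every q_i > 0."""
    return math.log2(sum(x ** alpha * y ** (1.0 - alpha) for x, y in zip(p, q))) / (alpha - 1.0)

def cq_divergence(weights, states, sigma, alpha):
    """D_alpha(rho_XQ || rho_X (x) sigma_Q) for commuting (diagonal) states."""
    total = sum(
        w * sum(r ** alpha * s ** (1.0 - alpha) for r, s in zip(state, sigma))
        for w, state in zip(weights, states)
    )
    return math.log2(total) / (alpha - 1.0)

def mu_state(weights, states, alpha):
    """mu_{alpha,Q}: the normalized vector (sum_x p_x rho_x^alpha)^(1/alpha)."""
    zeta = [
        sum(w * state[i] ** alpha for w, state in zip(weights, states)) ** (1.0 / alpha)
        for i in range(len(states[0]))
    ]
    norm = sum(zeta)
    return [z / norm for z in zeta]

def decomposition_gap(weights, states, sigma, alpha):
    """Left-hand side minus right-hand side of the identity in Lemma II.6."""
    mu = mu_state(weights, states, alpha)
    lhs = cq_divergence(weights, states, sigma, alpha)
    rhs = cq_divergence(weights, states, mu, alpha) + renyi_divergence(mu, sigma, alpha)
    return lhs - rhs

weights = [0.3, 0.7]
states = [[0.9, 0.1], [0.2, 0.8]]
print(decomposition_gap(weights, states, sigma=[0.6, 0.4], alpha=1.7))
```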

Relating $D_\alpha$ to $\chi_\alpha$

We are interested in $\alpha$-Holevo quantities associated with ensembles $\{p_x,\rho_x\}_x$. Again, it is convenient to consider the corresponding cq-states (20). We define an $\alpha$-Holevo quantity of a cq-state as the corresponding quantity of the associated ensemble, that is,

$$\chi_\alpha(\rho_{XQ}) := \chi_\alpha(\{p_x,\rho_x\}_x),$$

and we will use ensembles and cq-states interchangeably.

We now essentially follow the arguments that Schumacher and Westmoreland [11] use to relate the Holevo quantity $\chi$ to the relative entropy $D$. Our goal is to obtain a similar characterization of $\chi^*_\alpha$ in terms of $D_\alpha$, as expressed by Lemma II.10 below. As a first step, we express the quantity $\chi_\alpha(\rho_{XQ})$ by an optimization over relative $\alpha$-entropies of cq-states.

Lemma II.7. For any cq-state $\rho_{XQ} \in \mathcal{S}(\mathcal{H}_X\otimes\mathcal{H}_Q)$, we have the identity

$$\chi_\alpha(\rho_{XQ}) = \min_{\sigma_Q}D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma_Q), \tag{21}$$

where the minimum is attained for the state $\sigma_Q = \mu_{\alpha,Q}(\rho_{XQ})$ defined in Lemma II.6. Furthermore, we have for all $\sigma_Q \in \mathcal{S}(\mathcal{H}_Q)$

$$D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma_Q) \ge \chi_\alpha(\rho_{XQ}) \tag{22}$$

with equality if and only if $\sigma_Q = \mu_{\alpha,Q}(\rho_{XQ})$.

Proof. The fact that $\mu_{\alpha,Q}(\rho_{XQ})$ achieves the minimum on the rhs. of (21) follows from Lemma II.6 and the positivity of $D_\alpha$ shown in Lemma II.5. Inserting the definition of $\mu_{\alpha,Q}(\rho_{XQ})$ into the expression on the rhs. of (21) proves the validity of (21). Inequality (22) directly follows from (21), Lemma II.6 and Lemma II.5. □
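The variational formula (21) can be illustrated for a commuting qubit ensemble by a brute-force search over diagonal reference states $\sigma_Q = (t, 1-t)$. In this sketch the grid resolution, the example ensemble and the function names are illustrative assumptions; the closed form for $\chi_\alpha$ is the one given in (11).

```python
import math

def cq_divergence(weights, states, sigma, alpha):
    """D_alpha(rho_XQ || rho_X (x) sigma_Q) for commuting (diagonal) states."""
    total = sum(
        w * sum(r ** alpha * s ** (1.0 - alpha) for r, s in zip(state, sigma))
        for w, state in zip(weights, states)
    )
    return math.log2(total) / (alpha - 1.0)

def renyi_holevo(weights, states, alpha):
    """Closed form chi_alpha = alpha/(alpha-1) * log2 tr[(sum_x p_x rho_x^alpha)^(1/alpha)]."""
    total = sum(
        sum(w * state[i] ** alpha for w, state in zip(weights, states)) ** (1.0 / alpha)
        for i in range(len(states[0]))
    )
    return alpha / (alpha - 1.0) * math.log2(total)

weights = [0.5, 0.5]
states = [[0.9, 0.1], [0.2, 0.8]]
alpha = 2.0

# Scan sigma = (t, 1 - t) on a grid and compare the smallest value with chi_alpha.
grid = [[t / 1000.0, 1.0 - t / 1000.0] for t in range(1, 1000)]
best = min(cq_divergence(weights, states, sigma, alpha) for sigma in grid)
chi = renyi_holevo(weights, states, alpha)
print(best, chi)
```

Every grid point gives a value at least as large as $\chi_\alpha$, in line with (22), and the smallest one agrees with $\chi_\alpha$ up to the grid resolution.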

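Note that by reinserting (21) into the identity given in Lemma II.6 we obtain the identity

$$\chi_\alpha(\rho_{XQ}) + D_\alpha(\mu_{\alpha,Q}(\rho_{XQ})\|\sigma_Q) = D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma_Q) \tag{23}$$

for all $\sigma_Q \in \mathcal{S}(\mathcal{H}_Q)$. As a next step, we extend the cq-state $\rho_{XQ}$ by an additional classical symbol and show how $\chi_\alpha$ for the new state relates to the original quantity.

Lemma II.8. Consider the cq-state $\rho_{XQ}$ of Eq. (20) and let $|\perp\rangle$ be a normalized state on $\mathcal{H}_X$ such that $\langle\perp|x\rangle = 0$ for all $x \in \mathcal{X}$. Let $\rho_0 \in \mathcal{S}(\mathcal{H}_Q)$ be arbitrary and consider the cq-state

$$\rho'_{XQ} = (1-\eta)\rho_{XQ} + \eta\,|\perp\rangle\langle\perp|\otimes\rho_0$$

for some parameter $\eta \in [0,1]$. Then (for $\alpha > 1$)

$$\chi_\alpha(\rho'_{XQ}) - \chi_\alpha(\rho_{XQ}) \ge \eta\Big(D_\alpha\big(\rho_0\|\mu_{\alpha,Q}(\rho'_{XQ})\big) - \chi_\alpha(\rho_{XQ})\Big). \tag{24}$$

Proof. For simplicity, set $\sigma'_Q = \mu_{\alpha,Q}(\rho'_{XQ})$, where $\mu_{\alpha,Q}$ is defined as in Lemma II.6. For $\alpha > 1$ we then have (by the first part of Lemma II.7)

$$\begin{aligned}
\chi_\alpha(\rho'_{XQ}) &= D_\alpha(\rho'_{XQ}\|\rho'_X\otimes\sigma'_Q) \\
&= \frac{1}{\alpha-1}\log\mathrm{tr}\Big((1-\eta)\sum_x p_x\rho_x^\alpha\,\sigma_Q'^{\,1-\alpha} + \eta\,\rho_0^\alpha\,\sigma_Q'^{\,1-\alpha}\Big) \\
&\ge (1-\eta)\frac{1}{\alpha-1}\log\mathrm{tr}\Big(\sum_x p_x\rho_x^\alpha\,\sigma_Q'^{\,1-\alpha}\Big) + \eta\,\frac{1}{\alpha-1}\log\mathrm{tr}\big(\rho_0^\alpha\,\sigma_Q'^{\,1-\alpha}\big) \\
&= (1-\eta)\,D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma'_Q) + \eta\,D_\alpha(\rho_0\|\sigma'_Q).
\end{aligned}$$

Here we used the concavity of $\log$ to obtain the inequality. In particular, bounding $D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma'_Q)$ by $\chi_\alpha(\rho_{XQ})$ using (22), we get

$$\chi_\alpha(\rho'_{XQ}) \ge (1-\eta)\,\chi_\alpha(\rho_{XQ}) + \eta\,D_\alpha\big(\rho_0\|\mu_{\alpha,Q}(\rho'_{XQ})\big).$$

This is the claim (24). □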



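Optimal $\mathcal{A}$-ensembles for $\chi_\alpha$

We now restrict the quantum states $\rho_x$ to be in some subset $\mathcal{A} \subset \mathcal{S}(\mathcal{H}_Q)$. For a fixed set $\mathcal{A} \subset \mathcal{S}(\mathcal{H}_Q)$, we define an $\mathcal{A}$-ensemble to be an ensemble $\{p_x,\rho_x\}_x$ where $\rho_x \in \mathcal{A}$ for all $x \in \mathcal{X}$. An $\mathcal{A}$-cq-state $\rho_{XQ}$ is a cq-state defined by an $\mathcal{A}$-ensemble. Our main focus is on the $\alpha$-Holevo quantity, maximized over all $\mathcal{A}$-ensembles (or equivalently all $\mathcal{A}$-cq-states), that is, the quantity

$$\chi^*_\alpha(\mathcal{A}) := \max_{\{p_x,\rho_x\}_x,\ \rho_x\in\mathcal{A}}\chi_\alpha(\{p_x,\rho_x\}_x).$$

We can show a maximal distance property similar to the one derived in [11] for the Holevo quantity $\chi$ and the relative entropy $D$.

Lemma II.9. Let $\mathcal{A} \subset \mathcal{S}(\mathcal{H})$ be some set of states, and suppose the $\mathcal{A}$-cq-state $\rho^*_{XQ}$ achieves the maximum of $\chi_\alpha$, that is, $\chi_\alpha(\rho^*_{XQ}) = \chi^*_\alpha(\mathcal{A})$. Then

$$D_\alpha\big(\rho_0\|\mu_{\alpha,Q}(\rho^*_{XQ})\big) \le \chi^*_\alpha(\mathcal{A}) \quad\text{for any state } \rho_0 \in \mathcal{A}.$$

Proof. Assume that there exists a state $\rho_0 \in \mathcal{A}$ such that

$$D_\alpha\big(\rho_0\|\mu_{\alpha,Q}(\rho^*_{XQ})\big) > \chi^*_\alpha(\mathcal{A}). \tag{25}$$

Consider the state $\rho'_{XQ} = (1-\eta)\rho^*_{XQ} + \eta\,|\perp\rangle\langle\perp|\otimes\rho_0$ for $0 < \eta < 1$. Observe that this is an $\mathcal{A}$-cq-state. As $\eta \to 0$, we have $D_\alpha(\rho_0\|\mu_{\alpha,Q}(\rho'_{XQ})) \to D_\alpha(\rho_0\|\mu_{\alpha,Q}(\rho^*_{XQ}))$ by continuity. In particular, by (25), there is a value of $\eta$ such that

$$D_\alpha\big(\rho_0\|\mu_{\alpha,Q}(\rho'_{XQ})\big) > \chi^*_\alpha(\mathcal{A}).$$

Combining this with (24) leads to the contradiction

$$\chi_\alpha(\rho'_{XQ}) > \chi_\alpha(\rho^*_{XQ}) = \chi^*_\alpha(\mathcal{A}).$$

□

We are now ready to prove the following lemma. Note that (15) of Lemma I.3 corresponds to the special case where $\mathcal{A} = \{\Lambda(\rho) \mid \rho \in \mathcal{S}(\mathcal{H}_{in})\}$ is chosen as the set of potential output states of the channel $\Lambda$.

Lemma II.10. Let $\mathcal{A} \subset \mathcal{S}(\mathcal{H})$ be a set of states and $\alpha > 1$. Then

$$\min_{\sigma\in\mathcal{S}(\mathcal{H})}\max_{\rho\in\mathcal{A}}D_\alpha(\rho\|\sigma) \le \chi^*_\alpha(\mathcal{A}) \le \min_{\sigma\in\mathcal{A}}\max_{\rho\in\mathcal{A}}D_\alpha(\rho\|\sigma). \tag{26}$$

Proof. Consider an arbitrary $\mathcal{A}$-cq-state $\rho_{XQ}$. We show that for any $\sigma \in \mathcal{A}$, the quantity $\max_{\rho\in\mathcal{A}}D_\alpha(\rho\|\sigma)$ is an upper bound on any quantity $\chi_\alpha(\rho_{XQ})$. Indeed, by (22) we have

$$\begin{aligned}
\chi_\alpha(\rho_{XQ}) &\le D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma) \\
&\le \max_{\rho_{XQ}\ \mathcal{A}\text{-cq-state}}D_\alpha(\rho_{XQ}\|\rho_X\otimes\sigma) \\
&= \frac{1}{\alpha-1}\log\max_{\{p_x,\rho_x\in\mathcal{A}\}}\mathrm{tr}\Big(\sum_x p_x\rho_x^\alpha\,\sigma^{1-\alpha}\Big) \\
&= \frac{1}{\alpha-1}\log\max_{\rho\in\mathcal{A}}\mathrm{tr}\big(\rho^\alpha\sigma^{1-\alpha}\big) \\
&= \max_{\rho\in\mathcal{A}}D_\alpha(\rho\|\sigma).
\end{aligned}$$

The upper bound in (26) follows from this by taking the minimum over $\sigma \in \mathcal{A}$.

We know from Lemma II.9 that

$$\chi^*_\alpha(\mathcal{A}) \ge \max_{\rho\in\mathcal{A}}D_\alpha\big(\rho\|\mu_{\alpha,Q}(\rho^*_{XQ})\big), \tag{27}$$

where $\rho^*_{XQ}$ is the state that achieves the optimum in $\chi^*_\alpha(\mathcal{A})$. Observe that $\mu_{\alpha,Q}(\rho^*_{XQ}) \in \mathcal{S}(\mathcal{H})$ is a state (but not necessarily an element of $\mathcal{A}$). Therefore, the lower bound in (26) follows from (27). □

PROOF OF LEMMA I.2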

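We separate the two parts of the proof of this lemma, first addressing the general bound on the error exponent in terms of the regularized Holevo quantities.

Proof of part (1). Consider a CPTPM $\Phi: \mathcal{B}(\mathcal{H}_{in}) \to \mathcal{B}(\mathcal{H}_{out})$. Fix a set of states $\{\rho_x\}_{x=1}^{2^{nR}} \subset \mathcal{S}(\mathcal{H}_{in}^{\otimes n})$ and a POVM $\{E_x\}_{x=1}^{2^{nR}}$ on $\mathcal{H}_{out}^{\otimes n}$, and consider the success probability $P^{\Phi}_{succ}(n,R)$ defined by (2). Let $\sigma_x = \Phi^{\otimes n}(\rho_x)$. Since $y \mapsto y^{1/\alpha}$ is operator monotone for $\alpha > 1$ (see e.g., [27, Theorem V.1.9]), we have the operator inequality

$$\sigma_x = (\sigma_x^\alpha)^{1/\alpha} \le \Big(\sum_y\sigma_y^\alpha\Big)^{1/\alpha}$$

for all $x$. Inserting this into (2) gives

$$\begin{aligned}
P^{\Phi}_{succ}(n,R) &\le 2^{-nR}\sum_x\mathrm{tr}\Big(E_x\Big(\sum_y\sigma_y^\alpha\Big)^{1/\alpha}\Big) \\
&= 2^{-nR}\,\mathrm{tr}\Big(\Big(\sum_y\sigma_y^\alpha\Big)^{1/\alpha}\Big) \\
&= 2^{-nR(1-\frac{1}{\alpha})}\,\mathrm{tr}\Big(\Big(\sum_y 2^{-nR}\sigma_y^\alpha\Big)^{1/\alpha}\Big) \\
&= 2^{(1-\frac{1}{\alpha})\left(-nR + \chi_\alpha(\{2^{-nR},\sigma_x\}_x)\right)} \\
&\le 2^{-(1-\frac{1}{\alpha})\left(nR - \chi^*_\alpha(\Phi^{\otimes n})\right)}.
\end{aligned}$$

Here we used the operator inequality above together with $E_x \ge 0$ in the first step, the fact that the elements of a POVM sum to the identity in the second step, and the definition of $\chi_\alpha$ applied to the ensemble defined by $\{\sigma_x = \Phi^{\otimes n}(\rho_x)\}_x$ together with the uniform distribution on $\{1,\ldots,2^{nR}\}$. Since both the set of states and the POVM were arbitrary, the claim follows from definition (14). □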
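The key inequality of this proof has a transparent classical counterpart: for commuting output states the optimal decoder picks the most likely message, and $\max_x\sigma_x(y) \le (\sum_x\sigma_x(y)^\alpha)^{1/\alpha}$. The sketch below checks this for $n$ uses of a bit flip channel when all $2^n$ input strings are used as codewords (rate $R = 1$). It covers only this classical special case, and the function names are illustrative.

```python
import itertools

def bit_flip_output(x, p):
    """Output distribution of n independent bit flips (probability p each) on the string x."""
    n = len(x)
    dist = {}
    for y in itertools.product((0, 1), repeat=n):
        flips = sum(a != b for a, b in zip(x, y))
        dist[y] = p ** flips * (1.0 - p) ** (n - flips)
    return dist

def success_and_bound(codewords, p, alpha):
    """Optimal average success probability and the bound 2^{-nR} sum_y (sum_x sigma_x(y)^alpha)^(1/alpha)."""
    outputs = [bit_flip_output(x, p) for x in codewords]
    m = len(codewords)
    ys = list(outputs[0].keys())
    success = sum(max(o[y] for o in outputs) for y in ys) / m
    bound = sum(sum(o[y] ** alpha for o in outputs) ** (1.0 / alpha) for y in ys) / m
    return success, bound

n = 4
codewords = list(itertools.product((0, 1), repeat=n))  # all 2^n strings: rate R = 1
success, bound = success_and_bound(codewords, p=0.1, alpha=2.0)
print(success, bound)
```

For this code the optimal success probability is $(1-p)^n$, while the bound evaluates to $((1-p)^\alpha + p^\alpha)^{n/\alpha}$; both decay exponentially in $n$ because the rate exceeds the capacity $1 - h(p)$.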



Proof of part (2). Substituting $\alpha = 1/(s+1)$ for $-1 < s < 0$ gives

$$\frac{\alpha-1}{\alpha}\big(R - \chi_\alpha(\{p_x,\Phi(\rho_x)\}_x)\big) = -sR + E_0(s,\{p_x,\Phi(\rho_x)\}_x), \tag{28}$$

where

$$E_0(s,\{p_x,\sigma_x\}_x) := s\cdot\chi_{1/(s+1)}(\{p_x,\sigma_x\}_x).$$

In Lemma 3 of the work of Ogawa and Nagaoka, it is shown that for all families of states $\{\sigma_x\}_x$, and all $R > \max_{\{q_x\}_x}\chi(\{q_x,\sigma_x\}_x)$, we have

$$\exists\,t < 0:\quad -sR + \min_{\{q_x\}_x}E_0(s,\{q_x,\sigma_x\}_x) > 0 \quad \forall s \in (t,0),$$

where the maximum and minimum are over all probability distributions $\{q_x\}$. Let $\{p_x,\rho_x\}_x$ be the ensemble which achieves the maximum in the definition of $\chi^*_\alpha(\Phi)$, and set $\sigma_x := \Phi(\rho_x)$. With (28) and the previous statement, we conclude that for all $R > \chi^*(\Phi)$, there exists $\beta > 1$ such that

$$\frac{\alpha-1}{\alpha}\big(R - \chi^*_\alpha(\Phi)\big) > 0 \quad \forall\alpha \in (1,\beta).$$

This is part (2) of the claim since $\alpha > 1$. □

AN ADDITIONAL TECHNICAL LEMMA

Lemma IV.11. Let $A \ge 0$ and $B \ge 0$ be two positive semi-definite operators on $\mathbb{C}^d$ with eigenvalues $\lambda^A = (\lambda^A_1,\ldots,\lambda^A_d)$ and $\lambda^B = (\lambda^B_1,\ldots,\lambda^B_d)$. Then there exist permutations $\pi_{\min},\pi_{\max} \in S_d$ such that for all unitaries $U$

$$\sum_{i=1}^d\lambda^A_{\pi_{\min}(i)}\lambda^B_i \le \mathrm{tr}(UAU^\dagger B) \le \sum_{i=1}^d\lambda^A_{\pi_{\max}(i)}\lambda^B_i.$$

Proof. Let $v = (v_1,\ldots,v_d)$ be the vector of diagonal entries of the matrix $UAU^\dagger$ in a basis consisting of normalized eigenvectors of $B$. A well-known result by Schur (see e.g., [26, Theorem 4.3.26]) states that the vector of diagonal entries of a nonnegative matrix majorizes the vector of its eigenvalues. Applied to $UAU^\dagger$, we conclude that $v$ majorizes $\lambda^A$. Another classical theorem by Hardy, Littlewood and Pólya (see e.g., [26, Theorem 4.3.33]) then shows that there exists a probability distribution $P$ (depending on $U$) over the group of permutations $S_d$ such that

$$v = \sum_{\pi\in S_d}P(\pi)\,\big(\lambda^A_{\pi(1)},\ldots,\lambda^A_{\pi(d)}\big).$$

The claim now follows by observing that $\mathrm{tr}(UAU^\dagger B) = v\cdot\lambda^B$, where $v\cdot\lambda^B$ denotes the Euclidean inner product of vectors $v$ and $\lambda^B$. □
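The statement of Lemma IV.11 can be probed numerically in the simplest case $d = 2$. The sketch below takes $A$ and $B$ diagonal in the same basis and restricts $U$ to real rotations, so that $\mathrm{tr}(UAU^\dagger B) = \sum_{i,j}|U_{ij}|^2\lambda^A_j\lambda^B_i$; this restriction, the example eigenvalues and the function names are assumptions made for illustration, not a test of the general lemma.

```python
import itertools
import math

def trace_uaub(eig_a, eig_b, theta):
    """tr(U A U^dagger B) for A = diag(eig_a), B = diag(eig_b), U a 2x2 rotation by theta."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    overlap = [[c2, s2], [s2, c2]]  # entries |U_ij|^2, a doubly stochastic matrix
    return sum(overlap[i][j] * eig_a[j] * eig_b[i] for i in range(2) for j in range(2))

def permutation_bounds(eig_a, eig_b):
    """Smallest and largest value of sum_i a_{pi(i)} b_i over all permutations pi."""
    values = [
        sum(eig_a[perm[i]] * eig_b[i] for i in range(len(eig_b)))
        for perm in itertools.permutations(range(len(eig_a)))
    ]
    return min(values), max(values)

eig_a, eig_b = [0.8, 0.2], [0.6, 0.4]
lower, upper = permutation_bounds(eig_a, eig_b)
samples = [trace_uaub(eig_a, eig_b, k * math.pi / 50.0) for k in range(101)]
print(lower, min(samples), max(samples), upper)
```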



